Goto

Collaborating Authors

 asymptotically optimal



Improving the Knowledge Gradient Algorithm

Neural Information Processing Systems

The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms.








Finite-TimeAnalysisofRound-Robin Kullback-LeiblerUpperConfidenceBoundsfor OptimalAdaptiveAllocationwithMultiplePlaysand MarkovianRewards

Neural Information Processing Systems

Forouranalysis wedevise several concentration results forMarkovchains, including amaximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal, finite-time guarantees for the case of multiple plays, and i.i.d.